Clean directory and load packages
The objective of this notebooks is to update the extraction list for our collaborative PWS Pink Salmon whole genome re-sequencing (WGR) project with the Christie Lab at Purdue University. The goal of this notebook is 2-fold:
This project is using leftover Pink Salmon Disaster 2016 funds to try to address questions about potential genetic mechanisms causing reduced RRS. This project is broken into two main questions:
The study design for this extraction list comes from:
V:\Documents\5_Coastwide\Multispecies\AHRP\Pink Salmon Disaster Funding\Round2\Objective 11 PWS WGR\Sample Units.xlsx
Sheet 2
Screenshot of sample design
Using “output/PWS Pink Salmon WGR Extraction List.xlsx” and “output/extraction_selection.xlsx” to hand-pick most extractions
Read in the original and revised P076 Wells tables from LOKI iStrategy (internal ADF&G GCL database).
(wells_original <- readr::read_csv(file = "../data/P076 for Sequencing_Wells_original_bad.csv"))
Rows: 1039 Columns: 6── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Silly Code, Well, Tissue Type
dbl (3): Lab Project ID, Plate ID, Fish
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
(wells_revised <- readr::read_csv(file = "../data/P076 for Sequencing_Wells_revised_good.csv"))
Rows: 1039 Columns: 6── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Silly Code, Well, Tissue Type
dbl (3): Lab Project ID, Plate ID, Fish
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Fish that are the 12 fish with missing tissues that we did not end up using.
dplyr::anti_join(x = dplyr::select(.data = wells_original, `Silly Code`, `Fish`), y = dplyr::select(.data = wells_revised, `Silly Code`, `Fish`))
Joining, by = c("Silly Code", "Fish")
Replacement 12 fish that got extracted and sequenced.
dplyr::anti_join(x = dplyr::select(.data = wells_revised, `Silly Code`, `Fish`), y = dplyr::select(.data = wells_original, `Silly Code`, `Fish`))
Joining, by = c("Silly Code", "Fish")
We know that we had 12 fish replacements, but we need to check if these replacements changed the DNA plate map.
This is the original fish and well location info. These are wrong.
dplyr::anti_join(x = wells_original, y = wells_revised)
Joining, by = c("Lab Project ID", "Plate ID", "Silly Code", "Fish", "Well", "Tissue Type")
These is the revised fish and well location info. This is the correct fish and well location data that should be used moving forward.
dplyr::anti_join(x = wells_revised, y = wells_original)
Joining, by = c("Lab Project ID", "Plate ID", "Silly Code", "Fish", "Well", "Tissue Type")
There were 12 fish that were originally selected for DNA extraction that had missing tissues. Those 12 fish were replaced and then prior to DNA extraction, the DNA plate map was re-sorted by Fish ID, so it resulted in 32 fish from collection PSRIN14 getting shifted into different wells. No other collections were affected, everything else is as it should be.
extraction_selection.xlsx individuals tabHow many fish per silly?
Read in all PSPRIN14 extractions
extraction_selection_PSPRIN14_PSPRIN15 <-
readr::read_csv(file = "../output/extraction_selection_PSPRIN14_PSPRIN15.csv") %>%
dplyr::select(-dwp_barcode)
Rows: 72 Columns: 9── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (6): silly_source, silly, dwp_barcode, tissue_type, sex, otolith_mark_present
dbl (2): fish_id, dwp_well
date (1): sample_date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
extraction_selection_PSPRIN14_v2 <-
readr::read_csv(file = "../output/extraction_selection_PSPRIN14_v2.csv")
Rows: 12 Columns: 9── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (5): silly_source, silly, tissue_type, sex, otolith_mark_present
dbl (3): fish_id, dwp_barcode, dwp_well
date (1): sample_date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
extraction_selection_PSPRIN14_v3 <-
readr::read_csv(file = "../output/extraction_selection_PSPRIN14_v3.csv")
Rows: 8 Columns: 9── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (5): silly_source, silly, tissue_type, sex, otolith_mark_present
dbl (3): fish_id, dwp_barcode, dwp_well
date (1): sample_date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
(
extraction_PSRPIN14 <-
dplyr::bind_rows(
extraction_selection_PSPRIN14_PSPRIN15,
extraction_selection_PSPRIN14_v2,
extraction_selection_PSPRIN14_v3
)
)
Join up with wells_revised
(
wells_revised %>%
dplyr::filter(`Silly Code` == "PSPRIN14") %>%
dplyr::rename(silly = "Silly Code",
fish_id = "Fish") %>%
dplyr::select(silly, fish_id) %>%
dplyr::left_join(extraction_PSRPIN14, by = c("silly", "fish_id")) %>%
dplyr::select(silly, fish_id, sex, otolith_mark_present, sample_date) %>%
dplyr::mutate(
sex = dplyr::case_when(sex == "M" ~ "male",
sex == "F" ~ "female"),
origin = dplyr::case_when(
otolith_mark_present == "NO" ~ "natural",
otolith_mark_present == "YES" ~ "hatchery",
TRUE ~ otolith_mark_present
)
) %>%
dplyr::select(silly, fish_id, sex, origin, sample_date) %>%
readr::write_csv(file = "../output/extraction_PSPRIN14_revised.csv")
)
Read in all riverdist output.
(
stream_specimens_riverdist_all_streams_2013_2020 <-
readr::read_csv(file = "../data/stream_specimens_riverdist_all_streams_2013_2020.csv")
)
Rows: 235693 Columns: 15── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (4): Sample, ADFGStreamCode, StreamName, SpawningState
dbl (8): riverdist_seg, riverdist_vert, riverdist_snapdist, Latitude, Longitude, mouthdist, hitide, dist2tide
lgl (3): PreSpawn, PartialSpawner, PreyedUpon
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Reformat to get intertidal vs. upstream.
(
location_tidal <-
stream_specimens_riverdist_all_streams_2013_2020 %>%
dplyr::rename(sample = Sample,
distance_tide = dist2tide) %>%
dplyr::mutate(
intertidal = dplyr::case_when(
distance_tide <= 0 ~ "Intertidal",
distance_tide > 0 ~ "Upstream",
is.na(distance_tide) ~ NA_character_
) # create intertidal
) %>%
dplyr::select(sample, intertidal)
)
erb_2017 <-
readr::read_csv(file = "../data/Erb 2017 AHRP Salmon Biological Data 20220321_151027.csv")
Rows: 14955 Columns: 22── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (9): SILLY_CODE, SEX, TISSUE_TYPE, DNA_TRAY_CODE, LOCATION_CODE, SAMPLE_ID, OTOLITH_MARK_PRESENT, OTOLITH_MARK_ID,...
dbl (7): COLLECTION_ID, FISH_ID, LENGTH_MM, DNA_TRAY_WELL_CODE, SAMPLE_YEAR, IS_MISSING_PAIRED_DATA_EXISTS, WELL_HAS_M...
lgl (5): TARGET_DNA_TRAY_CODE, TARGET_DNA_TRAY_WELL_POS, TARGET_CONTAINER_ARRAY_TYPE_ID, CONTAINER_ARRAY_TYPE, DETERMI...
dttm (1): SAMPLE_DATE
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
stockdale_2015_2016_hogan_2015_2016 <-
readr::read_csv(file = "../data/Stockdale 2015-2016 Hogan 2015-2016 Spring 2014-2015 AHRP Salmon Biological Data 20220322_115608.csv")
Rows: 80100 Columns: 22── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (9): SILLY_CODE, SEX, TISSUE_TYPE, DNA_TRAY_CODE, LOCATION_CODE, SAMPLE_ID, OTOLITH_MARK_PRESENT, OTOLITH_MARK_ID,...
dbl (7): COLLECTION_ID, FISH_ID, LENGTH_MM, DNA_TRAY_WELL_CODE, SAMPLE_YEAR, IS_MISSING_PAIRED_DATA_EXISTS, WELL_HAS_M...
lgl (5): TARGET_DNA_TRAY_CODE, TARGET_DNA_TRAY_WELL_POS, TARGET_CONTAINER_ARRAY_TYPE_ID, CONTAINER_ARRAY_TYPE, DETERMI...
dttm (1): SAMPLE_DATE
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
spring_2014_2015 <-
readr::read_csv(file = "../data/Spring 2014-2015 AHRP Salmon Biological Data 20220322_122538.csv")
Warning: One or more parsing issues, see `problems()` for detailsRows: 25240 Columns: 22── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (8): SILLY_CODE, SEX, TISSUE_TYPE, DNA_TRAY_CODE, LOCATION_CODE, SAMPLE_ID, OTOLITH_MARK_PRESENT, OTOLITH_MARK_STA...
dbl (7): COLLECTION_ID, FISH_ID, LENGTH_MM, DNA_TRAY_WELL_CODE, SAMPLE_YEAR, IS_MISSING_PAIRED_DATA_EXISTS, WELL_HAS_M...
lgl (6): OTOLITH_MARK_ID, TARGET_DNA_TRAY_CODE, TARGET_DNA_TRAY_WELL_POS, TARGET_CONTAINER_ARRAY_TYPE_ID, CONTAINER_AR...
dttm (1): SAMPLE_DATE
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
(oceanak_og <- dplyr::bind_rows(erb_2017,
stockdale_2015_2016_hogan_2015_2016,
spring_2014_2015)
)
Reformat
(
oceanak <- oceanak_og %>%
dplyr::filter(TISSUE_TYPE == "Heart-bulbus arteriosus") %>%
dplyr::rename(
silly = SILLY_CODE,
fish_id = FISH_ID,
tissue_type = TISSUE_TYPE,
dwp_barcode = DNA_TRAY_CODE,
dwp_well = DNA_TRAY_WELL_CODE,
sample_date = SAMPLE_DATE,
sex = SEX,
length_mm_meh = LENGTH_MM,
otolith_mark_present = OTOLITH_MARK_PRESENT,
otolith_mark_id = OTOLITH_MARK_ID
) %>%
dplyr::mutate(
sample_date = lubridate::ymd(sample_date),
hatchery = dplyr::case_when(
!is.na(otolith_mark_id) ~ stringr::str_sub(
string = otolith_mark_id,
start = 1,
end = 3
),
TRUE ~ otolith_mark_id
),
origin = dplyr::case_when(
otolith_mark_present == "NO" ~ "natural",
otolith_mark_present == "YES" ~ "hatchery"
),
sex = dplyr::case_when(
sex == "M" ~ "male",
sex == "F" ~ "female",
sex == "U" ~ "unknown",
TRUE ~ sex
)
) %>%
tidyr::unite(
col = "silly_source",
c(silly, fish_id),
sep = "_",
remove = FALSE
) %>%
tidyr::unite(
col = "sample",
c(dwp_barcode, dwp_well),
sep = "_",
remove = FALSE
) %>%
dplyr::select(
silly_source,
sample,
silly,
fish_id,
dwp_barcode,
dwp_well,
sample_date,
sex,
length_mm_meh,
origin,
otolith_mark_present,
otolith_mark_id,
hatchery
)
)
Join in location data to the everything else from the data warehouse.
(paired_data <- oceanak %>%
dplyr::left_join(y = location_tidal, by = "sample"))
Now join this all in with the wells_revised
wells_revised %>%
tidyr::unite(
col = "silly_source",
c("Silly Code", "Fish"),
sep = "_",
remove = FALSE
) %>%
dplyr::left_join(y = paired_data, by = "silly_source") %>%
dplyr::filter(!is.na(sample))
Double check to make sure this is how many fish we should have…
Whew, we did it
extraction_selection.xlsx tab individuals(individuals <- readxl::read_xlsx(path = "../output/extraction_selection.xlsx", sheet = "individuals"))
We are going to replace this tab with a join between wells_revised, the existing individuals tab, and our new paired data
(
individuals_new <- wells_revised %>%
dplyr::select(-`Lab Project ID`) %>%
dplyr::rename(
dna_plate_id = "Plate ID",
silly = "Silly Code",
fish_id = "Fish",
dna_plate_well = "Well",
tissue_type = "Tissue Type"
) %>%
dplyr::select(dna_plate_id,
dna_plate_well,
silly,
fish_id,
tissue_type) %>%
dplyr::left_join(y = individuals, by = c("silly", "fish_id")) %>%
dplyr::left_join(
y = select(.data = paired_data,-c(sex, origin, sample_date)),
by = c("silly", "fish_id")
)
)
Great, write all of this out, paste into extraction_selection_revised_20221014.xlsx as the new individuals tab, annotate the columns, send to Will.
readr::write_csv(x = individuals_new, file = "../output/extraction_selection_revised_20221014.csv")
End…time for a beer
PSPRIN14, DNA plate 63218 strikes again!
After our last go around re-visiting this, William Hemstrom (Mark Christie’s new post-doc from UC Davis) noticed on 11/3/22 that one sample from plate 63218 was incorrect. The issue was well A7 vs. B7. He is seeing B7 as a NTC, whereas in our update we had A7 as the NTC.
Kristen looked at the DNA quant data from the BioTek plate reader and she saw that we had DNA in A7 and C7, but B7 was consistent with an NTC (e-mail on 11/4/22).
Heather and Erin Dooley looked at the plate map again and resolved the error on 11/14/22, B7 is now back to an NTC (jives with Will’s data and Kristen’s quant data).
I just need to double check that only A7 and B7 changed, then send Will the update!
Read in the original and revised P076 Wells tables from LOKI iStrategy (internal ADF&G GCL database).
(wells_original <- readr::read_csv(file = "../data/P076 for Sequencing_Wells_original_bad.csv"))
Rows: 1039 Columns: 6── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Silly Code, Well, Tissue Type
dbl (3): Lab Project ID, Plate ID, Fish
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
(wells_revised <- readr::read_csv(file = "../data/P076 for Sequencing_Wells_revised_good.csv"))
Rows: 1039 Columns: 6── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Silly Code, Well, Tissue Type
dbl (3): Lab Project ID, Plate ID, Fish
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
(wells_revised_v2 <- readr::read_csv(file = "../data/P076 for Sequencing_Wells_revised_v2_better.csv"))
Rows: 1039 Columns: 6── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Silly Code, Well, Tissue Type
dbl (3): Lab Project ID, Plate ID, Fish
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Are there any differences in the fish?
dplyr::anti_join(
x = dplyr::select(.data = wells_revised, `Silly Code`, `Fish`),
y = dplyr::select(.data = wells_revised_v2, `Silly Code`, `Fish`)
)
Joining, by = c("Silly Code", "Fish")
Yes, PSPRIN14_176 no longer exists (which is as it should be)
Are there any differences in the fish?
dplyr::anti_join(
x = dplyr::select(.data = wells_revised_v2, `Silly Code`, `Fish`),
y = dplyr::select(.data = wells_revised, `Silly Code`, `Fish`)
)
Joining, by = c("Silly Code", "Fish")
And PSPRIN14_175 does exist (which is also as it should be)
Are there any differences in the wells?
dplyr::anti_join(
x = dplyr::select(.data = wells_revised, `Silly Code`, `Fish`, `Plate ID`, `Well`),
y = dplyr::select(.data = wells_revised_v2, `Silly Code`, `Fish`, `Plate ID`, `Well`)
)
Joining, by = c("Silly Code", "Fish", "Plate ID", "Well")
Are there any differences in the wells?
dplyr::anti_join(
x = dplyr::select(.data = wells_revised_v2, `Silly Code`, `Fish`, `Plate ID`, `Well`),
y = dplyr::select(.data = wells_revised, `Silly Code`, `Fish`, `Plate ID`, `Well`)
)
Joining, by = c("Silly Code", "Fish", "Plate ID", "Well")
Okay, cool, all is well now (pun intended). Heather and Erin accidently messed up the plate map for 63218 when they were fixing things last go around. They have caught that error and update the extraction manager, collection manager, and iStrategy for P076. PSPRIN14_176 no longer exists, 63218_B7 is an NTC, and PSPRIN14_175 is in 63218_A7. No other collections were affected, everything else is as it should be.
extraction_selection.xlsx individuals tabHow many fish per silly?
wells_revised_v2 %>%
dplyr::count(`Silly Code`)
Read in all PSPRIN14 extractions
extraction_selection_PSPRIN14_PSPRIN15 <-
readr::read_csv(file = "../output/extraction_selection_PSPRIN14_PSPRIN15.csv") %>%
dplyr::select(-dwp_barcode)
Rows: 72 Columns: 9── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (6): silly_source, silly, dwp_barcode, tissue_type, sex, otolith_mark_present
dbl (2): fish_id, dwp_well
date (1): sample_date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
extraction_selection_PSPRIN14_v2 <-
readr::read_csv(file = "../output/extraction_selection_PSPRIN14_v2.csv")
Rows: 12 Columns: 9── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (5): silly_source, silly, tissue_type, sex, otolith_mark_present
dbl (3): fish_id, dwp_barcode, dwp_well
date (1): sample_date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
extraction_selection_PSPRIN14_v3 <-
readr::read_csv(file = "../output/extraction_selection_PSPRIN14_v3.csv")
Rows: 8 Columns: 9── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (5): silly_source, silly, tissue_type, sex, otolith_mark_present
dbl (3): fish_id, dwp_barcode, dwp_well
date (1): sample_date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
(
extraction_PSRPIN14 <-
dplyr::bind_rows(
extraction_selection_PSPRIN14_PSPRIN15,
extraction_selection_PSPRIN14_v2,
extraction_selection_PSPRIN14_v3
)
)
Join up with wells_revised_v2
(
wells_revised_v2 %>%
dplyr::filter(`Silly Code` == "PSPRIN14") %>%
dplyr::rename(silly = "Silly Code",
fish_id = "Fish") %>%
dplyr::select(silly, fish_id) %>%
dplyr::left_join(extraction_PSRPIN14, by = c("silly", "fish_id")) %>%
dplyr::select(silly, fish_id, sex, otolith_mark_present, sample_date) %>%
dplyr::mutate(
sex = dplyr::case_when(sex == "M" ~ "male",
sex == "F" ~ "female"),
origin = dplyr::case_when(
otolith_mark_present == "NO" ~ "natural",
otolith_mark_present == "YES" ~ "hatchery",
TRUE ~ otolith_mark_present
)
) %>%
dplyr::select(silly, fish_id, sex, origin, sample_date) %>%
readr::write_csv(file = "../output/extraction_PSPRIN14_revised_v2.csv")
)
Read in all riverdist output.
(
stream_specimens_riverdist_all_streams_2013_2020 <-
readr::read_csv(file = "../data/stream_specimens_riverdist_all_streams_2013_2020.csv")
)
Rows: 235693 Columns: 15── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (4): Sample, ADFGStreamCode, StreamName, SpawningState
dbl (8): riverdist_seg, riverdist_vert, riverdist_snapdist, Latitude, Longitude, mouthdist, hitide, dist2tide
lgl (3): PreSpawn, PartialSpawner, PreyedUpon
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Reformat to get intertidal vs. upstream.
(
location_tidal <-
stream_specimens_riverdist_all_streams_2013_2020 %>%
dplyr::rename(sample = Sample,
distance_tide = dist2tide) %>%
dplyr::mutate(
intertidal = dplyr::case_when(
distance_tide <= 0 ~ "Intertidal",
distance_tide > 0 ~ "Upstream",
is.na(distance_tide) ~ NA_character_
) # create intertidal
) %>%
dplyr::select(sample, intertidal)
)
erb_2017 <-
readr::read_csv(file = "../data/Erb 2017 AHRP Salmon Biological Data 20220321_151027.csv")
Rows: 14955 Columns: 22── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (9): SILLY_CODE, SEX, TISSUE_TYPE, DNA_TRAY_CODE, LOCATION_CODE, SAMPLE_ID, OTOLITH_MARK_PRESENT, OTOLITH_MARK_ID, OTOLITH_MARK...
dbl (7): COLLECTION_ID, FISH_ID, LENGTH_MM, DNA_TRAY_WELL_CODE, SAMPLE_YEAR, IS_MISSING_PAIRED_DATA_EXISTS, WELL_HAS_MORE_THAN_ONE_...
lgl (5): TARGET_DNA_TRAY_CODE, TARGET_DNA_TRAY_WELL_POS, TARGET_CONTAINER_ARRAY_TYPE_ID, CONTAINER_ARRAY_TYPE, DETERMINATION_COLLEC...
dttm (1): SAMPLE_DATE
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
stockdale_2015_2016_hogan_2015_2016 <-
readr::read_csv(file = "../data/Stockdale 2015-2016 Hogan 2015-2016 Spring 2014-2015 AHRP Salmon Biological Data 20220322_115608.csv")
Rows: 80100 Columns: 22── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (9): SILLY_CODE, SEX, TISSUE_TYPE, DNA_TRAY_CODE, LOCATION_CODE, SAMPLE_ID, OTOLITH_MARK_PRESENT, OTOLITH_MARK_ID, OTOLITH_MARK...
dbl (7): COLLECTION_ID, FISH_ID, LENGTH_MM, DNA_TRAY_WELL_CODE, SAMPLE_YEAR, IS_MISSING_PAIRED_DATA_EXISTS, WELL_HAS_MORE_THAN_ONE_...
lgl (5): TARGET_DNA_TRAY_CODE, TARGET_DNA_TRAY_WELL_POS, TARGET_CONTAINER_ARRAY_TYPE_ID, CONTAINER_ARRAY_TYPE, DETERMINATION_COLLEC...
dttm (1): SAMPLE_DATE
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
spring_2014_2015 <-
readr::read_csv(file = "../data/Spring 2014-2015 AHRP Salmon Biological Data 20220322_122538.csv")
Warning: One or more parsing issues, see `problems()` for detailsRows: 25240 Columns: 22── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (8): SILLY_CODE, SEX, TISSUE_TYPE, DNA_TRAY_CODE, LOCATION_CODE, SAMPLE_ID, OTOLITH_MARK_PRESENT, OTOLITH_MARK_STATUS_CODE
dbl (7): COLLECTION_ID, FISH_ID, LENGTH_MM, DNA_TRAY_WELL_CODE, SAMPLE_YEAR, IS_MISSING_PAIRED_DATA_EXISTS, WELL_HAS_MORE_THAN_ONE_...
lgl (6): OTOLITH_MARK_ID, TARGET_DNA_TRAY_CODE, TARGET_DNA_TRAY_WELL_POS, TARGET_CONTAINER_ARRAY_TYPE_ID, CONTAINER_ARRAY_TYPE, DET...
dttm (1): SAMPLE_DATE
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
(oceanak_og <- dplyr::bind_rows(erb_2017,
stockdale_2015_2016_hogan_2015_2016,
spring_2014_2015)
)
Confirm the absence of length data for PSPRIN14_175
Reformat
(
oceanak <- oceanak_og %>%
dplyr::filter(TISSUE_TYPE == "Heart-bulbus arteriosus") %>%
dplyr::rename(
silly = SILLY_CODE,
fish_id = FISH_ID,
tissue_type = TISSUE_TYPE,
dwp_barcode = DNA_TRAY_CODE,
dwp_well = DNA_TRAY_WELL_CODE,
sample_date = SAMPLE_DATE,
sex = SEX,
length_mm_meh = LENGTH_MM,
otolith_mark_present = OTOLITH_MARK_PRESENT,
otolith_mark_id = OTOLITH_MARK_ID
) %>%
dplyr::mutate(
sample_date = lubridate::ymd(sample_date),
hatchery = dplyr::case_when(
!is.na(otolith_mark_id) ~ stringr::str_sub(
string = otolith_mark_id,
start = 1,
end = 3
),
TRUE ~ otolith_mark_id
),
origin = dplyr::case_when(
otolith_mark_present == "NO" ~ "natural",
otolith_mark_present == "YES" ~ "hatchery"
),
sex = dplyr::case_when(
sex == "M" ~ "male",
sex == "F" ~ "female",
sex == "U" ~ "unknown",
TRUE ~ sex
)
) %>%
tidyr::unite(
col = "silly_source",
c(silly, fish_id),
sep = "_",
remove = FALSE
) %>%
tidyr::unite(
col = "sample",
c(dwp_barcode, dwp_well),
sep = "_",
remove = FALSE
) %>%
dplyr::select(
silly_source,
sample,
silly,
fish_id,
dwp_barcode,
dwp_well,
sample_date,
sex,
length_mm_meh,
origin,
otolith_mark_present,
otolith_mark_id,
hatchery
)
)
Join in location data to the everything else from the data warehouse.
(paired_data <- oceanak %>%
dplyr::left_join(y = location_tidal, by = "sample"))
Now join this all in with the wells_revised_v2
wells_revised_v2 %>%
tidyr::unite(
col = "silly_source",
c("Silly Code", "Fish"),
sep = "_",
remove = FALSE
) %>%
dplyr::left_join(y = paired_data, by = "silly_source") %>%
dplyr::filter(!is.na(sample))
Double check to make sure this is how many fish we should have…
wells_revised_v2 %>%
dplyr::filter(`Silly Code` %in% c("PERB17", "PHOGAN15", "PHOGAN16", "PSPRIN14", "PSPRIN15", "PSTOCK15", "PSTOCK16")) %>%
dplyr::count(`Silly Code`)
Whew, we did it
extraction_selection.xlsx tab individuals(
individuals <-
readxl::read_xlsx(path = "../output/extraction_selection_revised_20221014.xlsx", sheet = "individuals") %>%
dplyr::select(silly, fish_id, sex, origin, date, comment) %>%
dplyr::mutate(date = lubridate::as_date(date))
)
Warning: 86 failed to parse.
Stopped here, need to fix this 11/15/22! paired_data is NOT joining, for whatever esoteric reason.
We are going to replace this tab with a join between wells_revised, the existing individuals tab, and our new paired data
(
individuals_new <- wells_revised_v2 %>%
dplyr::select(-`Lab Project ID`) %>%
dplyr::rename(
dna_plate_id = "Plate ID",
silly = "Silly Code",
fish_id = "Fish",
dna_plate_well = "Well",
tissue_type = "Tissue Type"
) %>%
dplyr::select(dna_plate_id,
dna_plate_well,
silly,
fish_id,
tissue_type) %>%
tidyr::unite(
col = "silly_source",
c("silly", "fish_id"),
sep = "_",
remove = FALSE
) %>%
dplyr::left_join(y = individuals, by = c("silly", "fish_id")) %>%
dplyr::left_join(
y = paired_data,
by = "silly_source",
suffix = c("_indiv", "_paired")
)
)
Double check our hommie in 63218_A7
individuals_new %>%
dplyr::filter(dna_plate_id == 63218)
Great, write all of this out, paste into extraction_selection_revised_20221115.xlsx as the new individuals tab, annotate the columns, send to Will.
readr::write_csv(x = individuals_new, file = "../output/extraction_selection_revised_20221115.csv")
End…time for a beer